fix(ai-red-teaming): repair SDK config regression + restore local analytics #34
Merged
…lytics

Generated attack workflows failed to configure the Dreadnode SDK and produced no consumable results.

Root causes:
- The codegen SDK-config block referenced a non-existent `dn.server` module attribute in its fallback path, raising AttributeError that surfaced as a misleading 'FATAL: Could not configure SDK'.
- It gated configuration on DREADNODE_SERVER/DREADNODE_API_KEY only, skipping its own working branch even when a valid saved profile or DREADNODE_LLM_* runtime env was present.
- Generated scripts never wrote local analytics, so inspect_results / validate_attack_results / get_analytics_summary reported false failures.

Fixes:
- Defer credential resolution to dn.configure() (explicit > env > profile) and read .server off the returned instance, not the module.
- _resolve_platform_env(): also recognize DREADNODE_LLM_BASE/_API_KEY.
- Generated workflows now run the SDK's deterministic analyze() over assessment.attack_results and persist a real *_analytics.json (no fabricated metrics) into the workspace dir the tools scan.
- results.py: parse the new analytics envelope (ASR from execution_stats.overall_asr, trials from total_trials) and make validate_attack_results / get_analytics_summary platform-aware so platform-only runs are not reported as hard failures.
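The env-recognition fix above can be illustrated with a small resolver. This is only a sketch under stated assumptions: the function name `resolve_platform_env`, the pair ordering, and the idea that `DREADNODE_LLM_BASE` plays the role of the server URL are all hypothetical and not taken from the capability's actual `_resolve_platform_env()`.

```python
import os
from typing import Mapping, Optional, Tuple

# Env var pairs checked in order; the first complete pair wins.
# ASSUMPTION: this ordering is illustrative, not taken from the PR.
_ENV_PAIRS = (
    ("DREADNODE_SERVER", "DREADNODE_API_KEY"),
    ("DREADNODE_LLM_BASE", "DREADNODE_LLM_API_KEY"),
)


def resolve_platform_env(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (server, api_key) from the first fully set pair, else None."""
    env = os.environ if env is None else env
    for server_var, key_var in _ENV_PAIRS:
        server = env.get(server_var, "").strip()
        api_key = env.get(key_var, "").strip()
        if server and api_key:
            return server, api_key
    # None means "nothing usable in the environment"; a caller could still
    # let the SDK fall back to a saved profile instead of failing hard.
    return None
```

Returning `None` rather than raising keeps the "no env credentials" case distinct from a hard failure, which matches the intent of deferring the final decision to `dn.configure()`.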
Add a shared safe_tool wrapper and apply it to all 21 tool entrypoints so any unexpected exception is caught and returned as a clean, user-facing message instead of a raw traceback. Diagnostics go to stderr only.

- tools/errors.py: new safe_tool decorator. Wraps sync/async tool fns, preserves name/docstring/signature/annotations (via functools.wraps) so the generated tool schema is unchanged, then applies @tool internally. Loaded by file path because capability tool files are imported as flat modules with no parent package (relative imports are unavailable).
- Replace @tool -> @safe_tool across assessment, attacks, goals, results, session, skills_manager, workflows.
- Harden previously-unguarded helpers so common recoverable cases degrade gracefully instead of raising:
  * assessment._load(): tolerate missing/corrupt JSON -> {}.
  * goals._load_goals(): tolerate missing/unreadable CSV -> [].

Verified: all 7 tool modules load under the real flat-module loader and expose all 21 tools; corrupt-file, missing-dataset and forced-exception paths all return clean strings with no traceback.
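A decorator of the kind described above might look roughly like the following. It is a minimal sketch, not the contents of `tools/errors.py`: the message wording and the demo functions are hypothetical, and the final step of applying the SDK's `@tool` is deliberately left out because its signature is not shown in this PR.

```python
import asyncio
import functools
import inspect
import sys
import traceback
from typing import Any, Callable


def safe_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a sync or async function so exceptions become a clean string."""

    def _handle(exc: Exception) -> str:
        # Full diagnostics go to stderr only; the caller gets one line.
        traceback.print_exception(type(exc), exc, exc.__traceback__, file=sys.stderr)
        return f"Tool '{fn.__name__}' failed: {type(exc).__name__}. See logs for details."

    if inspect.iscoroutinefunction(fn):

        @functools.wraps(fn)  # keeps name, docstring, annotations, signature
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:
                return _handle(exc)

        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return _handle(exc)

    return sync_wrapper


# Hypothetical demo tools, used only to exercise the wrapper.
@safe_tool
def read_report(path: str) -> str:
    """Read a report file."""
    raise PermissionError(path)


@safe_tool
async def fetch_status(run_id: str) -> str:
    """Fetch a run status."""
    raise RuntimeError(run_id)
```

Because `functools.wraps` sets `__wrapped__`, `inspect.signature(read_report)` still reports the original parameters, which is what allows a schema generator to see the same signature as before.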
Patch release covering the SDK-config regression fix, restored local analytics, and the safe_tool error-hardening in this PR.
…from display; add user-POV run sequence

Metric clarity:
- Present ASR (attack success rate) as the headline success-probability metric (0-100% / 0-1) in get_assessment_status and get_analytics_summary.
- Stop surfacing the severity-weighted 0-10 risk score to users. It is computed in the SDK and kept in the raw data / accepted by update_assessment_status for platform parity, but no longer displayed. (True P(success) is ASR; the /10 score is a separate severity measure, so showing both was confusing.)

UX:
- Greeting now includes a small 5-step user-POV sequence (Plan -> Generate -> Run -> Score -> Report) plus a one-line ASR explanation.
- Agent instructed to print a single-line plan before launching a run.

Note: this is a presentation-layer change in the capability; the SDK's risk_score computation is unchanged.
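For reference, the headline ASR figure can be read as successful trials over total trials. A minimal sketch, assuming ASR is stored as a 0-1 fraction and displayed as a percentage; the function names are hypothetical and not taken from the capability:

```python
def attack_success_rate(successes: int, total_trials: int) -> float:
    """Fraction of trials in which the attack succeeded (0.0 to 1.0)."""
    if total_trials <= 0:
        # No trials means no evidence of success; avoid dividing by zero.
        return 0.0
    return successes / total_trials


def format_asr(asr: float) -> str:
    """Render a 0-1 ASR fraction as a whole-number percentage."""
    return f"ASR {asr:.0%}"
```

With one successful trial out of one, `format_asr(attack_success_rate(1, 1))` gives `ASR 100%`.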
Summary
Fixes a regression that broke AI red-teaming attack workflows, restores consumable analytics, and hardens every tool so users never see raw tracebacks.
Part 1 — SDK config regression + local analytics
Root causes
- `dn.server` `AttributeError` (codegen regression). The generated SDK-config block referenced a non-existent `dn.server` module attribute in its fallback, raising an `AttributeError` that surfaced as a misleading `FATAL: Could not configure SDK`.
- Configuration was gated on `DREADNODE_SERVER`/`DREADNODE_API_KEY` only; runtimes injecting `DREADNODE_LLM_*` (or relying on the saved profile) fell into the broken path even though `dn.configure()` resolves credentials itself.
Changes
- `_build_configure()`: defer to `dn.configure()` (explicit > env > saved profile); read `.server` off the returned instance.
- `_resolve_platform_env()`: also accept `DREADNODE_LLM_BASE`/`DREADNODE_LLM_API_KEY`.
- `_build_analytics_writer()`: run the SDK's deterministic `analyze()` over `assessment.attack_results` and persist a real `*_analytics.json` (no fabricated metrics) into the workspace dir the tools scan. Wired into all 7 templates.
- `results.py`: envelope-aware parsing (ASR from `execution_stats.overall_asr`, trials from `total_trials`); `validate_attack_results`/`get_analytics_summary` are platform-aware (no hard failure for platform-only runs).
Part 2 — Never surface raw tool errors to users
- `tools/errors.py` `safe_tool` wrapper: catches any unexpected exception in a tool and returns a clean, user-facing message; raw detail goes to stderr only. Preserves name/docstring/signature/annotations so tool schemas are unchanged. Loaded by file path (capability tool files are flat modules with no parent package).
- Applied `@safe_tool` to all 21 tool entrypoints (assessment, attacks, goals, results, session, skills_manager, workflows).
- `assessment._load()`: missing/corrupt JSON → `{}`
- `goals._load_goals()`: missing/unreadable CSV → `[]`
Verification
TAP on `groq/meta-llama/llama-4-scout-17b-16e-instruct` (attacker = judge = target, 10 iters):
- `SDK configured: server=…` (no crash); standalone re-run exits 0.
- `[analytics] wrote local analytics: …/<id>_analytics.json`; `validate_attack_results` → ✅; summary shows ASR 100%, Risk 8.0/10, 1 high-severity finding, 1 trial.
Tool hardening:
- `PermissionError` → clean `safe_tool` message, traceback to stderr only. No tracebacks reach the user.
All modified files `py_compile` cleanly. No behavior change for environments already setting `DREADNODE_SERVER`/`DREADNODE_API_KEY`; tool schemas unchanged.
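The helper hardening listed under Part 2 (missing/corrupt JSON → `{}`, missing/unreadable CSV → `[]`) could be sketched as below. The function names and the CSV column names are hypothetical; the real `assessment._load()` and `goals._load_goals()` are not shown in this PR description.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def load_json_or_empty(path: Path) -> Dict[str, Any]:
    """Return the parsed JSON object, or {} if the file is missing or corrupt."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError covers missing or unreadable files; json.JSONDecodeError
        # and UnicodeDecodeError are both subclasses of ValueError.
        return {}
    # Valid JSON that is not an object (e.g. a list) is treated as empty too.
    return data if isinstance(data, dict) else {}


def load_goals_or_empty(path: Path) -> List[Dict[str, str]]:
    """Return CSV rows as dicts, or [] if the file is missing or unreadable."""
    try:
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    except (OSError, UnicodeDecodeError, csv.Error):
        return []
```

Catching only the recoverable error types, rather than a bare `Exception`, leaves genuinely unexpected failures to an outer wrapper such as `safe_tool`.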